Replication Code for Manuscript: 
Choma, E.F., Evans, J. S., Gomez-Ibanez, J. A., Di, Q., Schwartz, J., Hammitt, J. K., Spengler, J.D. (2021). Health benefits of decreases in on-road transportation emissions in the United States from 2008 to 2017. Accepted for publication at Proceedings of the National Academy of Sciences of the United States of America.

Author: Ernani F. Choma -- echoma@hsph.harvard.edu

The code contains three main directories -- each of which contains code, inputs and descriptions of where and how to download publicly-available data to run the code. Each directory can be run independently. For example, the outputs from the Marginal Damages Model (Results) are used as inputs for the Results with Emissions Code -- but duplicate files are provided in both directories.

The directories are
a) Results_With_Emissions
b) Marginal_Damages_Model
c) Preprocessing_of_Model_Input_Data

These directories contain 5 main parts of the code, including a detailed documentation file for each part:
1) Results With Emissions
Directory Location: Results_With_Emissions
Documentation File: Results_With_Emissions/Documentation_Results_With_Emissions_ReadMe.txt

2) Marginal Damages Model
Directory Location: Marginal_Damages_Model
Documentation File Locations: 
For the model: Marginal_Damages_Model/Documentation_Marginal_Damages_Model_ReadMe.txt
For the result data files: Marginal_Damages_Model/Results/Documentation_Marginal_Damages_Model_Results_ReadMe.txt

3) InMAP to County Population Mapping (inside Marginal Damages Model)
Directory Location: Marginal_Damages_Model/InMAP_to_County_Population_Mapping
Documentation File Location: Marginal_Damages_Model/InMAP_to_County_Population_Mapping/Documentation_Population_Mapping_ReadMe.txt

4) Preprocessing of CDC/HHS Mortality and Population Data (inside Preprocessing of Model Input Data)
Directory Location: Preprocessing_of_Model_Input_Data/CDC_HHS_Mortality
Documentation File Location: Preprocessing_of_Model_Input_Data/Documentation_Preprocessing_of_Model_Input_Data_ReadMe.txt

5) Preprocessing of EPA National Emissions Inventory (NEI) Emissions Data (inside Preprocessing of Model Input Data)
Directory Location: Preprocessing_of_Model_Input_Data/NEI
Documentation File Location: Preprocessing_of_Model_Input_Data/Documentation_Preprocessing_of_Model_Input_Data_ReadMe.txt


------------

NOTES: 
- Some model files (e.g., the Marginal Damages Model result files) are provided as matrices/arrays without identifiers or row names. In these files, the 3,108 rows and/or columns represent the 3,108 U.S. counties in our study, in the same order as the FIPS State + County Code List in file: Results_With_Emissions/Inputs/Auxiliary/STCOUList.csv
- All instances of 'tonne' refer to metric ton, unless 'short ton' is otherwise indicated.
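
The county-ordering note above can be illustrated with a minimal R sketch. The helper name label_by_county, the toy codes, and the toy values are hypothetical and are not part of the replication code; the layout of STCOUList.csv (header, column position) is also an assumption and should be checked against the file itself.

```r
# Hypothetical helper: attach county identifiers to an unlabeled model matrix.
# `fips` is assumed to hold the FIPS State + County codes in the same order as
# Results_With_Emissions/Inputs/Auxiliary/STCOUList.csv.
label_by_county <- function(x, fips, rows = TRUE, cols = FALSE) {
  fips <- as.character(fips)
  if (rows) {
    stopifnot(nrow(x) == length(fips))
    rownames(x) <- fips
  }
  if (cols) {
    stopifnot(ncol(x) == length(fips))
    colnames(x) <- fips
  }
  x
}

# Toy stand-in for the real list (3 counties instead of 3,108).
toy_fips <- c("01001", "01003", "01005")
toy_mv <- matrix(c(10, 20, 30), ncol = 1)  # made-up values, one row per county
labeled_mv <- label_by_county(toy_mv, toy_fips)

# With the real file, the codes would first have to be read in, for example
# (column position is an assumption; inspect the file before relying on it):
# fips <- read.csv("Results_With_Emissions/Inputs/Auxiliary/STCOUList.csv",
#                  colClasses = "character")[[1]]
```

Reading the codes as character keeps leading zeros (e.g., '01001'), which would be lost if they were parsed as numbers.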


------------

A brief summary and a description of the required data downloads is provided below:

------------

1) Results_With_Emissions
This directory contains the R code to produce the final results in Choma et al. (2021), including Figures 1-4 and S1-S10 and supplemental datasets S1-S5. It combines the outputs from our marginal damages model with on-road transportation emissions from EPA's National Emissions Inventory.

In order to run the code and reproduce our final results, the following downloads are required:

a) County Data from the US Census Bureau
Source: U.S. Census Bureau, 2020
Dataset name: 'co-est2019-alldata.csv'
Available at: https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv
Accessed on May 03, 2021
Data Last Modified, according to US Census Bureau: 03/26/2020 09:40
File Size: 3.5M

b) Metropolitan Area data from the U.S. Census Bureau
Data Source: U.S. Census Bureau, 2020 
Dataset name: 'cbsa-est2019-alldata.csv'
Available at: https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/metro/totals/cbsa-est2019-alldata.csv
Accessed: May 03, 2021
Data Last Modified, according to US Census Bureau: 03/26/2020 09:41
File Size: 1.2M

c) U.S. Census Bureau Cartographic Boundary File for U.S. counties at 1:500,000 resolution level (required to plot Figure 4 only)
Data Source: U.S. Census Bureau
Available at: https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_county_500k.zip
Accessed: May 04, 2021
Data Last Modified, according to US Census Bureau: 05/01/2020 13:27
File Size: 11M
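
As a convenience, the three downloads above could be scripted along the following lines. This is only a sketch: the helper fetch_inputs and the destination folder are hypothetical, and the replication code may expect the files in a specific location described in the documentation file for this directory.

```r
# Download manifest for the three Census Bureau files listed above.
census_inputs <- data.frame(
  file = c("co-est2019-alldata.csv",
           "cbsa-est2019-alldata.csv",
           "cb_2019_us_county_500k.zip"),
  url = c(
    "https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv",
    "https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/metro/totals/cbsa-est2019-alldata.csv",
    "https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_county_500k.zip"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fetch each file into `dest_dir`, skipping files that
# are already present. Returns the local paths.
fetch_inputs <- function(manifest, dest_dir) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dest_dir, manifest$file)
  for (k in seq_len(nrow(manifest))) {
    if (!file.exists(dest[k])) {
      download.file(manifest$url[k], dest[k], mode = "wb")
    }
  }
  dest
}

# Example call; "downloads" is a placeholder, not a path the code expects:
# local_paths <- fetch_inputs(census_inputs, "downloads")
```

Because files may be updated at the source, it is worth comparing the downloaded file sizes and modification dates with those listed above.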

------------

2) Marginal_Damages_Model
This directory contains the R code for the model that generates the marginal damages for emissions of each species under different CRFs and baseline data (baseline ambient PM2.5 levels and baseline mortality).

Outputs: We generate two sets of results:
(i) Marginal damages per tonne of emissions (Marginal Values -- MV), for each county. 

(ii) The source-receptor matrices (SRMs) for marginal damages -- these are the matrices MI described in Eq. 2 of the manuscript. 
They show the marginal damages occurring in each county as a consequence of an emission of 1 tonne in each county. 
The rows are the sources and the columns are the receptors, so the row sums of these matrices are equal to the total marginal damages in (i).
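
The relationship between the two outputs can be shown with a toy example in R. The 3 x 3 matrix and its values are made up for illustration; the real matrices are 3,108 x 3,108.

```r
# Toy 3-county SRM with made-up values: element [s, r] is the damage
# (2017 USD) in receptor county r caused by 1 tonne emitted in source county s.
srm <- matrix(c(50, 10,  5,
                 8, 40, 12,
                 2,  6, 30),
              nrow = 3, byrow = TRUE,
              dimnames = list(source = c("A", "B", "C"),
                              receptor = c("A", "B", "C")))

# Output (i): total marginal damage per tonne emitted in each source county
# is the sum across all receptors, i.e. the row sum.
mv <- rowSums(srm)

# Column sums answer a different question: the damage received by each county
# if every county emitted 1 tonne.
received <- colSums(srm)
```

Here mv is 65, 60, and 38 for counties A, B, and C, while received is 60, 56, and 47, so mixing up rows and columns gives different numbers.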

For both (i) and (ii), the damages are calculated for 80 different combinations (5 x 4 x 2 x 2) of pollutant species, CRFs, baseline ambient PM2.5 levels, and baseline mortality data. These are:
* 5 pollutant species: Primary PM2.5, SO2, NOX, NH3, and VOC
* 4 CRFs: GEMM (Burnett et al., 2018), Vodonos et al. (2018) Parametric, Vodonos et al. (2018) Spline, and Krewski et al. (2009).
* 2 Baseline ambient PM2.5 levels: 2008 and 2017
* 2 Baseline Mortality Data: 2008 and 2017
***ALL values are monetized damages (2017 USD) caused by emissions of 1 tonne (metric ton, or 10^6 grams) of a given species (depending on the dataset), in each source.
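
The 80 combinations listed above can be enumerated in R as follows. This is a sketch for orientation only: the column names and the short labels used for the CRFs are our own shorthand here and are not necessarily the names used in the result files.

```r
# Enumerate every combination of species, CRF, and baseline year settings.
scenarios <- expand.grid(
  species = c("Primary PM2.5", "SO2", "NOX", "NH3", "VOC"),
  crf = c("GEMM", "Vodonos Parametric", "Vodonos Spline", "Krewski"),
  pm25_year = c(2008, 2017),
  mortality_year = c(2008, 2017),
  stringsAsFactors = FALSE
)

# 5 species x 4 CRFs x 2 PM2.5 baselines x 2 mortality baselines
n_scenarios <- nrow(scenarios)
```

The subset used in the Results_With_Emissions code corresponds to the rows where both pm25_year and mortality_year are 2017 for the SRMs.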

There is also a separate documentation file for the results, i.e. for the datasets containing the results of the Marginal Damages Model. This additional file is located at:
Marginal_Damages_Model/Results/Documentation_Marginal_Damages_Model_Results_ReadMe.txt.

Note: Outputs (i) and part of (ii) (i.e., the SRMs for 2017 ambient PM2.5 levels and 2017 baseline mortality) are used as inputs in the Results_With_Emissions code, and are also provided as inputs in the Results_With_Emissions directory.

In order to run the code and reproduce our marginal damages results, the following downloads are required:

a) InMAP Source-Receptor Matrix (ISRM)
Source: The ISRM is from the Goodkind et al. (2019) study
Goodkind, A.L., Tessum, C.W., Coggins, J.S., Hill, J.D., Marshall, J.D., 2019. Fine-scale damage estimates of particulate matter air pollution reveal opportunities for location-specific mitigation of emissions. Proc. of the Natl. Acad. of Sci. of the U. S. A 116, 8775-8780. https://doi.org/10.1073/pnas.1816102116.
This dataset is available at:
https://dx.doi.org/10.5281/zenodo.3590127
Accessed on: March 03, 2021
File: isrm_v1.2.1.zip
Version: 1.2.1 (March 11, 2019)
Size (.zip): 12.5 GB
Note: after unzipping, the file is 165.1 GB in size.

b) Vodonos Coefficients
Source: This dataset is for the Vodonos et al. (2018) study.
Vodonos, A., Abu Awad, Y., Schwartz, J., 2018. The concentration-response between long-term PM2.5 exposure and mortality; A meta-regression approach. Environ. Res. 166, 677-689. https://doi.org/10.1016/j.envres.2018.06.021.
This dataset is available at:
https://github.com/AlinaVod/meta_regression_pm2.5
Accessed on: March 09, 2021
Version: May 29, 2020
Size (.zip, all files): 352K


3) Marginal_Damages_Model/InMAP_to_County_Population_Mapping
This directory contains the R code to map InMAP cells to U.S. counties. A summary description of that folder is provided below.


i) Output: One .RData file 'InMAP_County_Overlay.RData', containing a single object named InMAP.Cells.by.County. 
A detailed description is provided in the documentation file for the population mapping folder, but in summary:
InMAP.Cells.by.County is a large matrix (3,108 x 52,411), where the 3,108 rows are the 3,108 counties and the 52,411 columns are the 52,411 InMAP cells (from Goodkind et al., 2019 -- see a) below). 
Element Pij of this matrix (ith row and jth column) contains the fraction of the population of county "i" that is located in InMAP cell "j", i.e. 
Pij = Pj_in_i/Pi 
where Pj_in_i is the population of InMAP cell j that is located within the boundaries of county i; and Pi is the total population of county i.
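
The definition of Pij can be illustrated with a small R sketch. All object names and numbers below are made up; the real matrix is 3,108 x 52,411 and is built from Census and InMAP data as described in the population mapping documentation. The final step (a population-weighted county average) is shown as one possible use and is an assumption, not a statement of how the model uses the matrix.

```r
# Made-up overlap populations: element [i, j] is the population of InMAP
# cell j that lives inside county i (Pj_in_i). Two counties, three cells.
pop_overlap <- matrix(c(100, 300,   0,
                          0,  50, 150),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(county = c("c1", "c2"),
                                      cell = c("k1", "k2", "k3")))

# Pi: total population of county i, taken here as the sum over cells.
county_pop <- rowSums(pop_overlap)

# Pij = Pj_in_i / Pi. Dividing a matrix by a vector of length nrow(matrix)
# recycles down the columns, so row i is divided by county_pop[i].
P <- pop_overlap / county_pop

# Possible use (assumption): population-weighted county average of a
# cell-level quantity, such as a concentration.
cell_value <- c(k1 = 8, k2 = 10, k3 = 12)
county_avg <- drop(P %*% cell_value)
```

In this toy case each row of P sums to 1, and county_avg is 9.5 for c1 and 11.5 for c2.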

Note:
This output is used in the main marginal damages model.
It is provided as an input in the main directory -- this object was collected and saved in the same file as the other inputs used in the Marginal Damages Model.

In order to run the code and reproduce our population mapping results, the following downloads are required:

a) InMAP source-receptor matrix (ISRM) cells data. 
Source: The ISRM is from the Goodkind et al. (2019) study
Goodkind, A.L., Tessum, C.W., Coggins, J.S., Hill, J.D., Marshall, J.D., 2019. Fine-scale damage estimates of particulate matter air pollution reveal opportunities for location-specific mitigation of emissions. Proc. of the Natl. Acad. of Sci. of the U. S. A 116, 8775-8780. https://doi.org/10.1073/pnas.1816102116.
This dataset is available at: https://dx.doi.org/10.5281/zenodo.3590127
Accessed March 03, 2021
File: 'marginal_values_updated_110819.zip'
Version: 1.2.1
File size: 12.9MB
Note: unlike the main model code, this code does not use/does not require the download of the full ISRM, only of this smaller file with summary cell data.

b) U.S. Census Bureau County Boundaries
These are the 2019 County TIGER/Line shapefiles provided by the US Census Bureau
Dataset File: 'tl_2019_us_county.zip'
Available at: https://www2.census.gov/geo/tiger/TIGER2019/COUNTY/tl_2019_us_county.zip
Accessed March 05, 2021
File size: 76MB

c) U.S. Census Bureau Census Block Group Boundaries
These are the Block Group TIGER/Line shapefiles provided by the US Census Bureau
Available as 56 .zip files in total, one for each state/district/territory
Dataset files: 'tl_2019_[FIPS_ST_CODE]_bg.zip'
where [FIPS_ST_CODE] is the two-digit FIPS state code for each state/district/territory
Available at: https://www2.census.gov/geo/tiger/TIGER2019/BG/
Accessed March 05, 2021
File size: Approximately 640 MB in total (zipped). Over 1GB unzipped.
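
The 56 block group file names can be generated in R rather than typed by hand. The sketch below assumes the 56 files correspond to the 50 states, the District of Columbia, and five territories (FIPS codes 60, 66, 69, 72, and 78); this list should be checked against the directory listing at the URL above.

```r
# FIPS state codes: 01-56 with the unassigned codes 03, 07, 14, 43, and 52
# removed (50 states + DC), plus five territories (assumed list, see above).
state_fips <- as.integer(c(setdiff(1:56, c(3, 7, 14, 43, 52)),
                           60, 66, 69, 72, 78))

# File names follow the pattern tl_2019_[FIPS_ST_CODE]_bg.zip, with the code
# zero-padded to two digits.
bg_files <- sprintf("tl_2019_%02d_bg.zip", state_fips)
bg_urls <- paste0("https://www2.census.gov/geo/tiger/TIGER2019/BG/", bg_files)
```

The resulting vector bg_urls can then be passed, one element at a time, to download.file().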


------------

4) Preprocessing_of_Model_Input_Data/CDC_HHS_Mortality
This directory contains the R code to read and process the original mortality data from the Centers for Disease Control and Prevention (CDC) and the U.S. Department of Health and Human Services (HHS) to be used in the model. 

Outputs:
For Mortality, one .RData file is provided. It contains 4 datasets with death counts by county, age group, and year. 
A detailed description is provided in the documentation file for the Preprocessing_of_Model_Input_Data directory.
Outputs are used as inputs in the Marginal_Damages_Model code, and are also provided as inputs in the Marginal_Damages_Model directory.

Inputs are provided within this folder -- no additional downloads required.


------------

5) Preprocessing_of_Model_Input_Data/NEI
This directory contains the R code to read and process the original emissions data from the U.S. Environmental Protection Agency's National Emissions Inventories (NEI).

Outputs:
For Emissions, four .RData files are provided -- one for each NEI (2008, 2011, 2014, and 2017). 
Each .RData file contains multiple datasets (two for 2008, three for 2011, 2014, and 2017). 
Outputs are used as inputs in the Results_With_Emissions code, and are also provided as inputs in the Results_With_Emissions directory.

In order to run the code, the following downloads are required:

a) NEI 2008: 
[dataset] U.S. Environmental Protection Agency, 2018a. Data from “2008 National Emissions Inventory (NEI)”, 2008nei_v3.
Links for download:
a.1) Onroad emissions: 
Dataset file: '2008neiv3_onroad_byregions.zip'
Available at: https://gaftp.epa.gov/air/nei/2008/data_summaries/2008neiv3_onroad_byregions.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2018-01-31 13:44 
File Size: 208 M
a.2) Supporting data:
Dataset file: '2008nei_supdata_4c.zip'
Available at: https://gaftp.epa.gov/air/nei/2008/doc/2008v3_supportingdata/2008nei_supdata_4c.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2018-01-31 12:43
File Size: 163 M 

b) NEI 2011: 
[dataset] U.S. Environmental Protection Agency, 2018b. Data from “2011 National Emissions Inventory (NEI)”, 2011nei_v2. 
Links for download:
b.1) Onroad emissions: 
Dataset file: '2011neiv2_onroad_byregions.zip'
Available at: https://gaftp.epa.gov/air/nei/2011/data_summaries/2011v2/2011neiv2_onroad_byregions.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2018-02-01 05:27
File Size: 44 M
b.2) Supporting data:
Dataset file: '2011neiv2_supdata_or_VMT.zip'
Available at: https://gaftp.epa.gov/air/nei/2011/doc/2011v2_supportingdata/onroad/2011neiv2_supdata_or_VMT.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2018-02-21 08:17
File Size: 23 M 

c) NEI 2014: 
[dataset] U.S. Environmental Protection Agency, 2020a. Data from “2014 National Emissions Inventory (NEI)”, 2014nei_v2. 
Links for download:
c.1) Onroad emissions:
Dataset file: '2014neiv2_onroad_byregions.zip'
Available at: https://gaftp.epa.gov/air/nei/2014/data_summaries/2014v2/2014neiv2_onroad_byregions.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2020-11-19 10:01
File Size: 52 M
c.2) Supporting data:
Dataset file: '2014v2_onroad_activity_final.zip'
Available at: https://gaftp.epa.gov/air/nei/2014/doc/2014v2_supportingdata/onroad/2014v2_onroad_activity_final.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2018-05-15 08:58
File Size: 35 M 

d) NEI 2017: 
[dataset] U.S. Environmental Protection Agency, 2020b. Data from “2017 National Emissions Inventory (NEI)”, 2017nei_v1 (Apr 2020). 
Links for download:
d.1) Onroad emissions:
Dataset file: '2017neiApr_onroad_byregions.zip'
Available at: https://gaftp.epa.gov/air/nei/2017/data_summaries/2017v1/2017neiApr_onroad_byregions.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2020-04-30 09:53
File Size: 48 M
d.2) Supporting data:
Dataset file: '2017NEI_onroad_activity_final.zip'
Available at: https://gaftp.epa.gov/air/nei/2017/doc/supporting_data/onroad/2017NEI_onroad_activity_final.zip
Accessed on April 30, 2021
Last modified according to EPA website: 2020-04-14 14:30
File Size: 22 M